Increase buffer size to speed up Image.tobytes()
#9220
Conversation
Just to mention for anyone else reading this - individual users can already vary their individual experience by setting:

```python
from PIL import ImageFile

ImageFile.MAXBLOCK = 65536 * 4
```
Have you tried using the Arrow interface, which is zero-copy?
My use case still requires a numpy array as output (or at least access to the raw bytes). How would I do this from a user perspective with the Arrow interface? Currently I'm just using `np.asarray(img)`.
I would guess:

```python
import numpy as np
import pyarrow as pa
from PIL import Image

img = Image.new("RGB", (12, 12))
np.array(pa.array(img))
```

but it doesn't seem faster to me.
Here's a quick benchmark with this PR:

```python
import numpy as np
import pyarrow as pa
from PIL import Image

rng = np.random.default_rng(42)

for size in (128, 256, 512, 1024, 2048, 4096, 8192, 16384):
    img = rng.integers(0, 256, size=(size, size, 3), dtype=np.uint8)
    img = Image.fromarray(img)
    print(f"{size}x{size}")
    %timeit img.tobytes()
    %timeit np.asarray(img)
    %timeit pa.array(img)
    %timeit pa.array(img).to_numpy(zero_copy_only=False)
```
For image sizes of 4096x4096 it also seems like pyarrow fails with an error. So in summary, creating pyarrow arrays is much faster, but going from pyarrow to numpy is very slow.
Just in case it is something interesting that we should consider in the future, could you explain this slightly more?
The following code would raise:

```python
img = Image.fromarray(np.random.randint(0, 255, size=(128, 128, 3), dtype=np.uint8))
arr = pa.array(img)
np_arr = pa.array(img).to_numpy()
```

I haven't used pyarrow before, but judging from the docs it raises because "the conversion to a numpy array would require copying the underlying data (e.g. in presence of nulls, or for non-primitive types)", and since the image data doesn't map to a primitive Arrow type, a zero-copy conversion isn't possible.
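For reference, the workaround already used in the benchmark above is to allow the copy explicitly (same `img` as in the snippet above):

```python
# Permitting the copy avoids the exception, at the cost of an extra allocation.
np_arr = pa.array(img).to_numpy(zero_copy_only=False)
```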
The main benefit of this PR is that, since the buffer size depends on the image size, users will still have low memory usage for small images but benefit from a larger buffer size for large images. Let me know if you have any concerns that would prevent merging of the PR.
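To make the idea concrete, here is a rough sketch of the heuristic; the function name and the exact expression are illustrative, not the actual patch:

```python
from PIL import Image, ImageFile

def raw_encode_bufsize(im: Image.Image) -> int:
    # Illustrative sketch: scale the encoder buffer with the frame size so the
    # raw encoder can emit the whole image as a single chunk, while never
    # dropping below the old fixed default.
    frame_bytes = im.width * im.height * len(im.getbands())
    return max(ImageFile.MAXBLOCK, frame_bytes)
```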
@radarhere Any updates on when/whether this PR could be merged? Or are there any additional benchmarks that you would like me to run?
@lgeiger We'll probably need @wiredfool to take a closer look too.
The original point of this particular bit of code was to have predictable memory usage when running […]. So, where previously we needed 2x ImageMemory + 64k, now we need 3x ImageMemory. For smaller images, it's not a problem, but for larger images this may cause memory pressure where we didn't have it before. I'd consider that a regression.

One alternative here is to change the calculation so that it's the min of max(…). There may be other places where the […]
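One plausible reading of that truncated suggestion is a buffer size that scales with the image but is capped to bound memory pressure; the bounds below are assumptions, not values from the comment:

```python
MIN_BUFSIZE = 65536              # the old fixed buffer
MAX_BUFSIZE = 16 * 1024 * 1024   # assumed cap; the actual figure is cut off above

def capped_bufsize(frame_bytes: int) -> int:
    # Scale with the image, but stay within [MIN_BUFSIZE, MAX_BUFSIZE].
    return min(max(MIN_BUFSIZE, frame_bytes), MAX_BUFSIZE)
```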
@wiredfool Thanks for taking a look. I agree increasing […]. The way I understand the code is the following: all chunks are appended to a list which is joined afterwards, which causes the 2x ImageMemory usage on the Python side that you mentioned above (Pillow/src/PIL/Image.py, lines 798 to 808 in 6d6f049).
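Roughly, the referenced loop follows this pattern (a paraphrase for illustration, not the exact Pillow source):

```python
# Paraphrased from the referenced Image.tobytes() region; details may differ.
output = []
while True:
    bytes_consumed, errcode, data = encoder.encode(bufsize)
    output.append(data)  # chunks accumulate: ~1x image memory
    if errcode:
        break
# Joining allocates the final bytes object: another ~1x image memory.
result = b"".join(output)
```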
I don't think the max memory usage would include the additional 64k buffer, but I haven't looked at what the C code is actually doing, so I might be wrong. In any case, with this PR the output list only consists of a single item (assuming the buffer size was correct/large enough), preventing the need for allocating a new bytes object during the join. So the memory usage of the Python code would actually halve. I thought about changing the code to directly return […].

I double checked this with a memory profile and viewed the memory usage with memray:

```python
import memray
import numpy as np
from PIL import Image

rng = np.random.default_rng(42)

def get_image(size):
    return Image.fromarray(rng.integers(0, 256, size=(size, size, 3), dtype=np.uint8))

for size in (512, 1024, 2048, 4096, 8192, 16384):
    img = get_image(size)
    with memray.Tracker(f"pr_{size}.bin"):
        img.tobytes()
```

And the results show that this PR halves the memory usage, which matches my theory from above.
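A small detail behind the halving: with exactly one chunk, CPython's `bytes.join` can return the sole list item directly instead of building a new object, so the join costs nothing. A quick check (this relies on a CPython implementation detail, not a language guarantee):

```python
chunk = b"x" * 1024
joined = b"".join([chunk])
print(joined is chunk)  # True on CPython: the single chunk is returned as-is
```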
`Image.tobytes()` is used in the `__array_interface__` when images are passed to numpy using `np.asarray(...)`. Converting PIL images to numpy is very common; e.g. ML libraries like vllm or some pytorch dataloaders also commonly rely on this process. For large images this can be quite slow and can become a bottleneck, since `.tobytes()` encodes the data in fixed-size chunks which need to be joined afterwards (Pillow/src/PIL/Image.py, lines 798 to 808 in d42e537).

This PR increases the buffer size to match the image size when using the default raw encoder, instead of using a fixed value. In most cases this allows the image to be encoded in a single chunk, which speeds up encoding of large images by over 2x.
Benchmarked with the following IPython script:
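A minimal sketch of such a benchmark, assuming it mirrors the one posted earlier in the thread (IPython, since it uses `%timeit`):

```python
import numpy as np
from PIL import Image

rng = np.random.default_rng(42)

for size in (512, 1024, 2048, 4096, 8192, 16384):
    img = Image.fromarray(rng.integers(0, 256, size=(size, size, 3), dtype=np.uint8))
    print(f"{size}x{size}")
    %timeit img.tobytes()    # direct encode: benefits from the larger buffer
    %timeit np.asarray(img)  # goes through __array_interface__ -> tobytes()
```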